The State of Global Terrorism

An In-Depth Analysis of Trends and Threats

Author

Shreehar Joshi

Terrorism has been a constant hindrance to our efforts to achieve global peace and prosperity. From hostage situations and hijackings to mass shootings and bombings, terrorist attacks have a profound impact on both the victims and the larger society: they cause physical harm and loss of life, as well as emotional trauma and psychological distress. They can also have long-lasting socio-economic consequences, disrupting trade and commerce, causing job losses, and eroding investor confidence.

As terrorist attacks occur more frequently than ever, it is crucial to understand their trends and patterns. In this blog post, I will examine various aspects of terrorism, including regions, targets, methods, and motives, using three open-source datasets: the Global Terrorism Database (GTD), which contains information on over 180,000 terrorist attacks worldwide from 1970 to 2017; World, Region, Country GDP/GDP per capita, which includes the GDP per capita of different countries from 1960 to 2021; and the World Bank National Accounts data, which provides the fertility rate and net migration of each country from 1955 to 2020.

I hope this project sheds some light on the phenomenon of global terrorism and better equips us to combat it in the future. So let’s roll up our sleeves and demystify the world of global terrorism.

Analysis

Code
# Import modules
import time
import warnings

import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.lines import Line2D
import seaborn as sns
import bar_chart_race as bcr
import nltk
from wordcloud import WordCloud

from sklearn.metrics import mean_squared_error
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn import neighbors
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, RobustScaler  # RobustScaler is used for feature scaling below

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, Dropout, Conv1D, MaxPooling1D, Flatten,
                                     LSTM, Bidirectional, GRU)

nltk.download("stopwords", quiet=True)  # required once for the motive word clouds
warnings.filterwarnings("ignore", category=FutureWarning)

# Read first database
df_attacks = pd.read_csv("../data/globalterrorismdb_0718dist.csv", encoding="ISO-8859-1", low_memory=False)
df_attacks.head()
df_attacks = df_attacks[['eventid','iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 
'provstate', 'city', 'latitude', 'longitude', 'suicide', 'attacktype1_txt', 'targtype1_txt', 
'gname', 'motive', 'weaptype1_txt', 'nkill']]
df_attacks.rename(columns={"eventid": "Event ID", "iyear": "Year", "imonth": "Month", 
"country_txt": "Country", "region_txt": "Region", "provstate": "Province/State", "city": "City", "latitude": "Latitude", 
"longitude": "Longitude", "suicide": "Suicide", "attacktype1_txt": "Attack Type",
"targtype1_txt": "Target Type", "gname": "Terrorist Group", "motive": "Motive", 
"weaptype1_txt": "Weapon Type", "nkill": "Casualties"}, inplace=True)

# Read second database
df_population = pd.read_csv("../data/population.csv")
df_population = df_population[["Country","Year", "Migrants(net)", "FertilityRate"]]
df_population.rename(columns= {"FertilityRate": "Fertility Rate", "Migrants(net)": "Migrants (net)"}, inplace=True)

# Read third database
df_gdp = pd.read_csv("../data/world_country_gdp_usd.csv")
df_gdp = df_gdp[['Country Name', 'year', 'GDP_USD']]
df_gdp.rename(columns={"Country Name": "Country", "year": "Year", "GDP_USD": "GDP (in USD)"}, inplace=True)

# Read database for the population of the US
df_us_population = pd.read_csv("../data/us_population.csv")
df_us_population = df_us_population[["state", "pop2022"]]
df_us_population.rename(columns= {"state": "State", "pop2022": "Population"}, inplace=True) 

# Show the terrorist attacks as a scatter animation
fig = px.scatter_geo(df_attacks, lon="Longitude", lat="Latitude", animation_frame="Year", color="Region",
                     projection="equirectangular", animation_group="Year", title="Terrorist Attacks (1970 - 2017)")
fig.update_layout(title_x=0.44)
fig.show()

Figure 1: Global Terrorist Attacks

The animation above shows a significant number of terrorist attacks in the US from 1970 to 2017. This is surprising, especially considering the effort the US has put into tackling terrorism in almost every terrorist-prone country over the past 50 years. Which states had the highest number of terrorist attacks? Let’s find out.

Code
# Map US states to their abbreviations
us_state_to_abbrev = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
    "Colorado": "CO",
    "Connecticut": "CT",
    "Delaware": "DE",
    "Florida": "FL",
    "Georgia": "GA",
    "Hawaii": "HI",
    "Idaho": "ID",
    "Illinois": "IL",
    "Indiana": "IN",
    "Iowa": "IA",
    "Kansas": "KS",
    "Kentucky": "KY",
    "Louisiana": "LA",
    "Maine": "ME",
    "Maryland": "MD",
    "Massachusetts": "MA",
    "Michigan": "MI",
    "Minnesota": "MN",
    "Mississippi": "MS",
    "Missouri": "MO",
    "Montana": "MT",
    "Nebraska": "NE",
    "Nevada": "NV",
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Ohio": "OH",
    "Oklahoma": "OK",
    "Oregon": "OR",
    "Pennsylvania": "PA",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "Tennessee": "TN",
    "Texas": "TX",
    "Utah": "UT",
    "Vermont": "VT",
    "Virginia": "VA",
    "Washington": "WA",
    "West Virginia": "WV",
    "Wisconsin": "WI",
    "Wyoming": "WY",
    "District of Columbia": "DC",
    "American Samoa": "AS",
    "Guam": "GU",
    "Northern Mariana Islands": "MP",
    "Puerto Rico": "PR",
    "United States Minor Outlying Islands": "UM",
    "U.S. Virgin Islands": "VI",
}

# Filter all the attacks in the US alone
df_attacks_us = df_attacks[df_attacks["Country"] == "United States"] 
df_attacks_us = pd.DataFrame(df_attacks_us.groupby("Province/State")["Event ID"].count())
df_attacks_us = df_attacks_us.reset_index()
df_attacks_us.rename(columns={"Province/State": "State", "Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_attacks_us = df_attacks_us[df_attacks_us["State"] != "Unknown"]
df_attacks_us["State Code"] = df_attacks_us["State"].apply(lambda x: us_state_to_abbrev[x])

# Standardize the terrorism score between 0 and 1 (min-max scaling)
def scale_column(df, column, minVal=None, maxVal=None):
    if minVal is None:
        minVal = df[column].min()
    if maxVal is None:
        maxVal = df[column].max()
    return list((df[column] - minVal) / (maxVal - minVal))

# Count the number of terrorist attacks in each US state and standardize the number based on population
df_attacks_us = df_attacks_us.merge(df_us_population[['State', 'Population']])
df_attacks_us["Number of Terrorist Attacks (Standardised)"] = df_attacks_us["Number of Terrorist Attacks"] / df_attacks_us["Population"]
tempVal = scale_column(df_attacks_us, "Number of Terrorist Attacks (Standardised)")
df_attacks_us["Number of Terrorist Attacks (Standardised)"] = tempVal
df_attacks_us = df_attacks_us.sort_values(by="Number of Terrorist Attacks (Standardised)", ascending=False)

# Plot the choropleth for terrorist attacks in the US states
fig = px.choropleth(df_attacks_us, locations='State Code', color='Number of Terrorist Attacks (Standardised)',
                    color_continuous_scale="Viridis",
                    locationmode="USA-states", 
                    scope="usa",
                    labels={'Number of Terrorist Attacks (Standardised)':'No. of Terrorist Attacks'},
                    title="Terrorist Attacks in the US (1970-2017)")
fig.update_layout(title_x=0.44)
fig.update_layout(legend={"xanchor": "right", "x": 0, "y": 1.9})
fig.update_layout(height=500, width=780)
fig.show()

Figure 2: Terrorist Attacks in the US

Figure 2 shows the US states with varying numbers of terrorist attacks, calculated by dividing the total number of attacks in each state by its population and then min-max scaling the result so that the state with the highest per-capita rate receives a value of 1 and the state with the lowest receives 0. We see that New York, Oregon, California, Washington, and Nebraska are the five most terrorist-prone states, while Kentucky, South Carolina, West Virginia, Alaska, and Arkansas are the safest in terms of attack frequency.
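The per-capita standardisation described above can be sketched on a toy table (the state names, counts, and populations here are made up):

```python
import pandas as pd

# Hypothetical attack counts and populations for three states
df = pd.DataFrame({
    "State": ["A", "B", "C"],
    "Attacks": [120, 30, 60],
    "Population": [2_000_000, 1_000_000, 3_000_000],
})

# Per-capita rate: attacks divided by population
df["Rate"] = df["Attacks"] / df["Population"]

# Min-max scale the rate so the most attack-prone state scores 1 and the least scores 0
df["Scaled"] = (df["Rate"] - df["Rate"].min()) / (df["Rate"].max() - df["Rate"].min())
```

Here state A ends up with a score of 1 and state C with 0; only the per-capita rates matter after scaling, not the raw counts.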

So what exactly motivates these terrorist groups, and has it changed over the last fifty years?

Code
# Create two word clouds showing the motives of the terrorist attacks.
# First, download the stopwords and add common words from the motives column
stpwrd = nltk.corpus.stopwords.words('english')
extended_list = ["specific", "motive", "unknown", "Unknown", "incident", "claimed", "responsibility", "however", "unaffiliated", "individual", "identified", "killed", "stated", "anti", "attacks", "protest", "carried", "attack", "trend", "larger", "may", "part", "following", "community", "sources", "violence", "targeting", "noted", "posited", "suspected", "members", "targeted", "also", "assailant", "perpetrator", "meant", "bring attention", "practice", "bring", "attention"]
stpwrd.extend(extended_list)

# Select all the attacks in the US
df_attacks_us = df_attacks[df_attacks["Country"] == "United States"]
df_attacks_us = df_attacks_us[["Year", "Motive"]]
df_attacks_us = df_attacks_us.dropna()

# Select the subset of the dataset above to only include the years between 1970 and 1999 inclusive.
temp_df = df_attacks_us[(df_attacks_us["Year"] >= 1970) & (df_attacks_us["Year"] < (2000))]
motive = list(temp_df["Motive"].values)
motive = " ".join(motive)

# Plot the word cloud
wordcloud = WordCloud(width=1000, height=800,
                background_color ='white',
                stopwords=stpwrd,
                color_func=lambda *args, **kwargs: "green",
                min_font_size = 10).generate(motive)
plt.figure(figsize = (12, 12), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off")
plt.tight_layout(pad = 2)
plt.title("Attack Motives (" + str(1970) + " - " + str(1999) + ")", fontdict={'fontsize': 36})
plt.show()


# Select the subset of the dataset above to only include the years between 2000 and 2017 inclusive. 
temp_df = df_attacks_us[(df_attacks_us["Year"] >= 2000) & (df_attacks_us["Year"] <= (2017))]
motive = list(temp_df["Motive"].values)
motive = " ".join(motive)

# Plot the word cloud
wordcloud = WordCloud(width=1000, height=800,
                background_color ='white', 
                stopwords=stpwrd,
                color_func=lambda *args, **kwargs: "purple",
                min_font_size = 10).generate(motive)
plt.figure(figsize = (12, 12), facecolor = None) 
plt.imshow(wordcloud) 
plt.axis("off")
plt.tight_layout(pad = 2)
plt.title("Attack Motives (" + str(2000) + " - " + str(2017) + ")", fontdict={'fontsize': 36})
plt.show()

(a) 1970-1999

(b) 2000-2017

Figure 3: Attack Motives in the US

Both word clouds share the common theme of abortion, suggesting it has been a prominent source of conflict for several decades. However, they also differ in significant ways. The first word cloud, covering the pre-2000 period, reveals issues relevant to Puerto Rico, Vietnam, and African American groups. The second, representing the post-2000 period, shows themes related to Iraq, ISIL, and Islamic states, suggesting an increase in religiously motivated attacks over the past 20 years. This shift also reflects changes in the domestic and international political landscape: from fighting the spread of communism and racism to confronting religiously motivated terrorism.

Now that we have analyzed the state of terrorism in the US, let’s zoom out to the bigger picture. We’ll begin by examining how the frequency of terrorist attacks has changed over the last 50 years.

Code
# Count the number of terrorist attacks each year and plot a bar chart.
yearly_freq = pd.DataFrame(df_attacks.groupby("Year")["Event ID"].count()).reset_index()
yearly_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
fig = px.bar(yearly_freq, x=yearly_freq["Year"], y=yearly_freq["Number of Terrorist Attacks"], title="Frequency of Terrorist Attacks (1970-2017)")
fig.update_layout(title_x=0.5)
fig.update_layout(height=400)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 4: Frequency of Terrorist Attacks

It is clear from Figure 4 that the number of terrorist attacks was at its lowest around 1972 and 2003 (note that the data for 1994 is missing, not zero) and has increased sharply over the last decade.
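Gaps like this are easy to mistake for years with zero attacks. One way to keep the distinction explicit, sketched here on made-up counts, is to reindex the yearly series over the full range so absent years show up as NaN rather than 0:

```python
import pandas as pd

# Hypothetical yearly attack counts where one year has no rows at all
counts = pd.Series({1992: 5, 1993: 7, 1995: 4}, name="Attacks")

# Reindexing over the full range turns the absent year into an explicit NaN,
# keeping "no data" distinct from "no attacks"
full = counts.reindex(range(1992, 1996))
missing_years = full[full.isna()].index.tolist()
```

`missing_years` contains only 1994 here; plotting `full` directly would leave a visible gap instead of silently drawing a zero-height bar.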

But, what parts of the world have experienced the highest number of terrorist attacks?

Code
# Count the number of terrorist attacks in each geographical region, grouped by attack type.
region_freq = pd.DataFrame(df_attacks.groupby(["Region", "Attack Type"])["Event ID"].count()).reset_index()
region_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
region_freq = region_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)
region_freq['Attack Type'] = region_freq['Attack Type'].replace(['Bombing/Explosion', 'Hostage Taking (Kidnapping)', 'Facility/Infrastructure Attack', 'Hostage Taking (Barricade Incident)'], ['Bombing', 'Hostage', 'Facility Attack', 'Hostage (Barr.)'])

# Plot the bar chart.
fig = px.bar(region_freq, x=region_freq["Region"], y=region_freq["Number of Terrorist Attacks"], color="Attack Type", height=400, title="Terrorist Attacks in Different Regions", barmode="relative")
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 5: Terrorist Attacks in different Regions

Figure 5 shows that the Middle East & North Africa, South Asia, and South America were the three most terrorist-prone regions, while Australasia & Oceania, Central Asia, and East Asia were the safest in terms of terrorism. It is also worth noting that in every region, bombing and armed assault were the most common forms of attack.

Let’s delve deeper to see which countries from these terrorist-prone regions were contributing the highest number of terrorist incidents.

Code
# Count the casualties and the number of attacks in each country.
df_countries_casualties = pd.DataFrame(df_attacks.groupby(["Country"])["Casualties"].sum().reset_index())
df_countries_terrorist_count = pd.DataFrame(df_attacks.groupby(["Country"])["Event ID"].count().reset_index())
df_countries_terrorist_count.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_merged_casualties_count = df_countries_casualties.merge(df_countries_terrorist_count[["Country", "Number of Terrorist Attacks"]])

# Map the country names to their ISO codes
df_iso_codes = px.data.gapminder()[["country", "iso_alpha"]]
df_iso_codes.rename(columns={"country": "Country", "iso_alpha": "Country Code"}, inplace=True)
df_iso_codes.drop_duplicates(inplace=True)
df_iso_codes = df_iso_codes.reset_index()
df_iso_codes.drop(["index"], axis=1, inplace=True)
df_countries_terrorist_count = df_countries_terrorist_count.merge(df_iso_codes[['Country', 'Country Code']])

# Plot a choropleth representing the frequency of attacks in different countries.
fig = px.choropleth(df_countries_terrorist_count, locations="Country Code",
                    color="Number of Terrorist Attacks",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title="Terrorist Attacks (1970 - 2017)")
fig.update_layout(title_x=0.44)
fig.update_layout(height=500, width=880)
fig.show()

Figure 6: Countries with the Highest Number of Attacks

Figure 6 shows that Iraq in the Middle East; Afghanistan, Pakistan, and India in South Asia, and Colombia in South America were the most terrorist-prone countries.
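One caveat about the choropleth above: the join with the gapminder ISO-code table matches on exact country names, so any country spelled differently in the two tables is silently dropped from the map. A minimal sketch (with hypothetical rows) of using a left merge with `indicator=True` to surface such mismatches:

```python
import pandas as pd

# Hypothetical tables whose country names do not fully agree
attacks = pd.DataFrame({"Country": ["Iraq", "Republic of the Congo"], "Attacks": [100, 8]})
iso = pd.DataFrame({"Country": ["Iraq", "Congo, Rep."], "Country Code": ["IRQ", "COG"]})

# A left merge keeps every attack row; indicator=True flags rows with no ISO match,
# which a plain inner merge would silently drop
merged = attacks.merge(iso, on="Country", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "Country"].tolist()
```

Any names in `unmatched` can then be renamed to their ISO-table spellings before the real merge, so they are not lost from the map.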

The analysis of global terrorism is incomplete without information on terrorist groups. How about we visualize the top 15 most notorious terrorist groups based on the number of casualties from their attacks?

Code
# Sum the casualties for each terrorist group
groupwise_casualty_freq = pd.DataFrame(df_attacks.groupby("Terrorist Group")["Casualties"].sum()).reset_index()
# Take the top 16 so that 15 groups remain after dropping "Unknown"
groupwise_casualty_freq = groupwise_casualty_freq.sort_values(by="Casualties", ascending=False)[:16]
notorious_groups = list(groupwise_casualty_freq["Terrorist Group"])
notorious_groups.remove("Unknown")
df_notorious_groups = df_attacks[df_attacks["Terrorist Group"].isin(notorious_groups)]
df_notorious_groups = pd.DataFrame(df_notorious_groups.groupby(["Terrorist Group", "Year"])["Casualties"].sum().reset_index())
df_notorious_groups["Terrorist Group"] = df_notorious_groups["Terrorist Group"].replace(["Farabundo Marti National Liberation Front (FMLN)", "Islamic State of Iraq and the Levant (ISIL)", "Kurdistan Workers' Party (PKK)", "Liberation Tigers of Tamil Eelam (LTTE)", "New People's Army (NPA)", "Nicaraguan Democratic Force (FDN)", "Revolutionary Armed Forces of Colombia (FARC)", "Shining Path (SL)", "Tehrik-i-Taliban Pakistan (TTP)"], ["Farbundo Liberation", "ISIL", "Kurdistan W.", "Tamil Tigers", "New People's Army", "Nicaraguan Force", "Colombian Force", "Shining Path", "Taliban Pakistan"])

# Plot a line chart for the 15 terrorist groups with the highest number of casualties
fig = px.line(df_notorious_groups, x="Year", y="Casualties", color="Terrorist Group", title='Attacks by different Terrorist Groups')
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 7: Attacks by Different Terrorist Groups

One cannot fail to notice the 2001 peak for Al-Qaida, which is widely regarded as the beginning of the rise of other Islamic extremist groups such as the Taliban, Al-Shabaab, and Boko Haram. As the steep lines after 2010 in Figure 7 show, the Taliban, Boko Haram, and ISIL appear to have killed more people over the last 50 years than the other 12 groups combined.

So what exactly do these terrorist groups target? Let’s find out.

Code
# Select the most common targets; take the top 11 so that 10 remain after dropping "Unknown"
TOP_N = 11
target_freq = pd.DataFrame(df_attacks.groupby("Target Type")["Event ID"].count()).reset_index()
target_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
rem_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[TOP_N:]
target_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:TOP_N]
target_freq = target_freq[target_freq['Target Type'] != "Unknown"]

# Plot a bar chart.
fig = px.bar(target_freq, x='Target Type', y='Number of Terrorist Attacks', title="Common Targets of Terrorist Attacks")
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 8: Common Targets of Terrorist attacks

Most attacks have targeted private citizens & property, the military, and the police. Private citizens and their property are generally the easiest targets and also the largest group, which may explain the high number of attacks against them.

Now, let’s change direction a little and analyze how terrorism relates to socio-economic factors like GDP and fertility rate.

Code
# Map a country to its geographical region
def map_region(country):
    return df_attacks.loc[df_attacks["Country"] == country, "Region"].iloc[0]

# Find the top 5 countries with the highest number of terrorist attacks
country_freq = pd.DataFrame(df_attacks.groupby("Country")["Event ID"].count()).reset_index()
country_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
country_freq = country_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:10]
country_freq["Region"] = country_freq["Country"].apply(map_region)
top_five_countries = list(country_freq["Country"].values)[:5]

# Count the yearly frequency of terrorist attacks of the top five countries.
country_freq_year = pd.DataFrame(df_attacks.groupby(["Year", "Country"])["Event ID"].count().reset_index())
country_freq_year = country_freq_year[country_freq_year["Country"].isin(top_five_countries)]
country_freq_year.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)

# Select only the attacks from the top five countries.
df_terrorist_gdp = df_gdp[(df_gdp["Country"].isin(top_five_countries)) & ((df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017))]
df_all_gdp = df_gdp[((df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017))]
df_all_gdp = df_all_gdp.dropna()
df_all_gdp = pd.DataFrame(df_all_gdp.groupby("Year").mean().reset_index())
df_all_gdp.rename(columns={"GDP (in USD)": "World"}, inplace=True)

# Assign a specific color to each of the countries. World will take the black color.
colorList = list(px.colors.qualitative.T10)
if colorList[0] != "black":
    colorList.insert(0, "black")
for country in top_five_countries:
    temp_gdp = df_terrorist_gdp[df_terrorist_gdp["Country"] == country]
    df_all_gdp[country] = list(temp_gdp["GDP (in USD)"])

# Plot a line chart showing the GDP of the top five countries and the world.
fig = px.line(df_all_gdp, x='Year', y=df_all_gdp.columns[1:], title="GDP of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={
                     "value": "GDP (in USD)",
                     "variable": ""
                 })
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.update_layout(title_x=0.5)
fig.update_layout(height=400, width=800)
fig.show()

# Select the fertility rate of the top five countries. 
df_all_fertility = df_population[(df_population["Year"] >= 1970) & (df_population["Year"] <= 2017)]
df_terrorist_fertility = df_population[(df_population["Country"].isin(top_five_countries)) & ((df_population["Year"] >= 1970) & (df_population["Year"] <= 2017))]
df_all_fertility = df_all_fertility.dropna()
df_all_fertility = df_all_fertility.drop(['Migrants (net)'], axis=1)
df_all_fertility = pd.DataFrame(df_all_fertility.groupby("Year").mean().reset_index())
df_all_fertility.rename(columns={"Fertility Rate": "World"}, inplace=True)
for country in top_five_countries:
    temp_fertility = df_terrorist_fertility[df_terrorist_fertility["Country"] == country]
    df_all_fertility[country] = list(temp_fertility["Fertility Rate"])

# Plot a line chart showing the fertility rate of the top five countries along with the same for the world.
fig = px.line(df_all_fertility, x='Year', y=df_all_fertility.columns[1:], title="Fertility Rate of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={
                     "value": "Fertility Rate",
                     "variable": ""
                 })
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.update_layout(title_x=0.5)
fig.update_layout(height=400, width=800)
fig.show()

(a) GDP

(b) Fertility Rate

Figure 9: Socio-economic Aspects of Terrorist-prone Countries

Figure 9 shows the GDP and fertility rate of the five most terrorist-prone countries identified above. All of them generally have a lower GDP and a higher fertility rate than the global average over this period. India is an exception, with its GDP growing faster than the global average; similarly, Colombia’s fertility rate has been below the global average since the 1980s.
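The world baseline in these charts is simply the cross-country mean for each year; a small sketch of that pattern on a hypothetical GDP panel:

```python
import pandas as pd

# Hypothetical GDP panel: two countries over two years
df = pd.DataFrame({
    "Country": ["X", "Y", "X", "Y"],
    "Year":    [2000, 2000, 2001, 2001],
    "GDP":     [10.0, 30.0, 12.0, 36.0],
})

# The per-year world baseline is the mean GDP across countries for that year
world = df.groupby("Year")["GDP"].mean().rename("World").reset_index()

# Join the baseline back so each country-year can be compared against it
df = df.merge(world, on="Year")
df["Above World Avg"] = df["GDP"] > df["World"]
```

Country X sits below the baseline in both years and Y above it, mirroring the kind of comparison drawn in Figure 9.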

Machine and Deep Learning

Code
# Remove the columns we will not be using for the modeling.
try:
    del df_attacks["Event ID"]
    del df_attacks["Motive"]
    del df_attacks["Latitude"]
    del df_attacks["Longitude"]
except KeyError:
    print("Some of the columns are not present")
df_attacks = df_attacks.dropna()
df_attacks[['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']] = df_attacks[['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']].apply(LabelEncoder().fit_transform)

# Split into predictor and response variables.
y = df_attacks["Casualties"]
X = df_attacks.drop(['Casualties'], axis=1)

# Split the data into train (70%), validation (15%), and test (15%) sets
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
# 0.15/0.85 of the remaining 85% yields a 15% validation share overall
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.15/0.85, random_state=42)

# Scale the dataset. Fit the scaler on the training set only,
# then apply the same transform to the validation and test sets to avoid data leakage.
scaler = RobustScaler()
X_train = scaler.fit_transform(X_train)
X_val = scaler.transform(X_val)
X_test = scaler.transform(X_test)


# Neural Networks
def create_bilstm():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, activation='relu', input_shape=(12,1), return_sequences=True)))
    model.add(Dropout(0.2))
    model.add(Bidirectional(LSTM(64, activation='relu')))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    return model

def create_ffnn():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(12,)))
    model.add(Dropout(0.3))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='sigmoid'))
    model.add(Dense(16, activation='tanh'))
    model.add(Dense(1))
    return model

def create_cnn():
    model = Sequential()
    model.add(Conv1D(32, 3, activation='relu', input_shape=(12,1)))
    model.add(MaxPooling1D(2))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(MaxPooling1D(2))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1))
    return model

def create_gru():
    model = Sequential()
    model.add(GRU(64, activation='tanh', input_shape=(12,1)))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    return model

# Result container
result = []
dlModels = {"Feed Forward NN": create_ffnn(), "CNN": create_cnn(), "GRU": create_gru(), "Bi-LSTM": create_bilstm()}


# Reshape the train, test and validation sets.
X_train_new = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val_new = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test_new = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Train the neural networks one at a time.
for name, model in dlModels.items():
    start_time = time.time()
    model.compile(optimizer='adam', loss='mse')
    if name == "Feed Forward NN":
        # The dense network takes flat 2-D inputs of shape (12,)
        model.fit(X_train, y_train, epochs=20, batch_size=128, validation_data=(X_val, y_val))
        y_pred = model.predict(X_test)
    else:
        # The CNN, GRU, and Bi-LSTM expect 3-D inputs of shape (12, 1)
        model.fit(X_train_new, y_train, epochs=20, batch_size=128, validation_data=(X_val_new, y_val))
        y_pred = model.predict(X_test_new)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, y_pred)), 2), round(time.time() - start_time, 2)])

# Train the machine learning models one at a time.
mlModels = {"Random Forest": RandomForestRegressor(), "K Neighbors": neighbors.KNeighborsRegressor(), "Decision Trees": DecisionTreeRegressor()}
for name, model in mlModels.items():
    start_time = time.time()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, pred)), 2), round(time.time() - start_time, 2)])

# Save the results in a csv file.
pd.options.display.float_format = '{:.2f}'.format
result_df = pd.DataFrame(result, columns=["Model", "Root Mean Squared Error", "Time (in seconds)"])
result_df.to_csv("./results.csv")  

Finally, let’s take the machine and deep learning algorithms out of our arsenal and tackle the problem of predicting the number of casualties in a given attack based on its date, country, region, state, city, suicidal intent, attack type, target type, terrorist group, and weapon used. Such a model could help intelligence agencies assess the severity of future attacks and prepare for them.

The dataset was split into train, validation, and test sets in a 70:15:15 ratio. The train and validation sets were used during the training phase, and the test set was used to assess each model’s performance, measured by its root mean squared error (RMSE) and the time it took. The results are shown in Figure 10.
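The RMSE used here is just the square root of the mean squared error; a minimal sketch on made-up predictions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true and predicted casualty counts for four attacks
y_true = np.array([0, 2, 5, 9])
y_pred = np.array([1, 2, 4, 11])

# Squared errors are (1, 0, 1, 4); their mean is 1.5, so RMSE = sqrt(1.5)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
```

Because errors are squared before averaging, a single badly mispredicted mass-casualty attack dominates the score, which is arguably the desired behaviour for this task.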

Code
# Read the results from the csv.
result_df = pd.read_csv("../results/results.csv")
result_df = result_df.sort_values(by=['Root Mean Squared Error'])

# Plot the data.
matplotlib.rc_file_defaults()
ax1 = sns.set_style(style=None, rc=None)
fig, ax1 = plt.subplots(figsize=(12,6))
colors = ["#5D3FD3", "#5D3FD3", "#5D3FD3","#5D3FD3", "#0096FF", "#0096FF", "#0096FF"]

# Plot the bar chart and set figure options.
sns.barplot(data = result_df, x='Model', y='Root Mean Squared Error', alpha=0.5, ax=ax1, palette=colors)
ax1.set_xticklabels(ax1.get_xticklabels(), fontsize=12)
ax1.set_xlabel("Models", fontsize=14)
ax1.set_ylabel("Root Mean Squared Error", fontsize=14)
ax1.set_title("Efficiency of Models", fontsize=16)

# Plot the lineplot on the same chart and change the alpha level of the charts.
ax2 = ax1.twinx()
ax2.set_ylabel("Time (in seconds)", fontsize=14)
dl = mpatches.Patch(color="#5D3FD3")
ml = mpatches.Patch(color="#0096FF")
custom_line = [Line2D([0], [0], color='#0096FF', lw=2), dl, ml]
leg = plt.legend(custom_line, ["Time", "DL Models", "ML Models"], loc="upper left")
for index, lh in enumerate(leg.legendHandles): 
    if index > 0:
        lh.set_alpha(0.5)
sns.lineplot(data = list(result_df["Time (in seconds)"]), marker='o', ax=ax2, color='#0096FF')
plt.show()

Figure 10: Efficiency of Models

The feed-forward neural network turned out to be the most effective model, achieving an RMSE of 8.68, while Decision Trees was the fastest, completing training and prediction in 0.99 seconds. In general, the neural networks achieved a lower RMSE than the classical machine learning models, but they were also slower to train and test.

Our analysis ends here for now. In the future, we will explore other variables in the terrorism database along with additional socio-economic factors and their relationship with terrorist attacks. We will also perform extensive hyperparameter tuning and train more sophisticated models, such as variants of FractalNets, ResNets, and XceptionNets, on a larger dataset, combining and engineering features from different socio-economic factors to achieve the lowest possible RMSE.

More Animations

Before you go, here’s a little treat for your eyes.

Code
# Code for bar chart race for countries with highest number of terrorist attacks.
# Yearly data is found for each of the countries.
df_countries_pivot = pd.DataFrame(df_attacks.groupby(["Country", "Year"]).count()).reset_index()
df_countries_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_countries_pivot = df_countries_pivot.pivot_table(values = 'Number of Terrorist Attacks',index = ['Year'], columns = 'Country')
df_countries_pivot.fillna(0, inplace=True)
df_countries_pivot.sort_values(list(df_countries_pivot.columns),inplace=True)
df_countries_pivot = df_countries_pivot.sort_index()
df_countries_pivot = df_countries_pivot.cumsum()  # cumulative totals across all countries
bcr.bar_chart_race(df = df_countries_pivot,
                   n_bars = 10,
                   period_length=1000,
                   sort='desc',
                   title="Countries with the Highest Number of Terrorist Attacks",
                   filter_column_colors=True,
                   filename = None)

# Code for bar chart race for terrorist attacks based on geographical regions.
df_region_pivot = pd.DataFrame(df_attacks.groupby(["Region", "Year"]).count()).reset_index()
df_region_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_region_pivot = df_region_pivot.pivot_table(values = 'Number of Terrorist Attacks',index = ['Year'], columns = 'Region')
df_region_pivot.fillna(0, inplace=True)
df_region_pivot.sort_values(list(df_region_pivot.columns),inplace=True)
df_region_pivot = df_region_pivot.sort_index()
df_region_pivot = df_region_pivot.cumsum()  # cumulative totals across all regions
bcr.bar_chart_race(df = df_region_pivot, 
                   n_bars = 12,
                   period_length=1000,
                   sort='desc',
                   title="Terrorist Attacks Based on Geographical Regions",
                   filter_column_colors=True,
                   filename = None)

# Code for bar chart race for terrorist groups with the highest number of attacks.
df_animation_pivot = df_notorious_groups.pivot_table(values = 'Casualties',index = ['Year'], columns = 'Terrorist Group')
df_animation_pivot.fillna(0, inplace=True)
df_animation_pivot.sort_values(list(df_animation_pivot.columns),inplace=True)
df_animation_pivot = df_animation_pivot.sort_index()
df_animation_pivot = df_animation_pivot.drop(columns=["Unknown"], errors="ignore")  # "Unknown" was already filtered out above
df_animation_pivot = df_animation_pivot.cumsum()  # cumulative totals across all groups
bcr.bar_chart_race(df = df_animation_pivot, 
                   n_bars = 10, 
                   period_length=1000,
                   sort='desc',
                   title="Terrorist Groups with the Highest Number of Attacks",
                   filename = None)

References

Countries in the world by population (2023). Worldometer. Retrieved February 5,
    2023, from https://www.worldometers.info/world-population/population-by-country/

Information on more than 200,000 terrorist attacks. Global Terrorism Database.
   Retrieved February 5, 2023, from https://www.start.umd.edu/gtd/

Lai, N. T. C. (2023, February 3). World population (1955-2020). Kaggle. Retrieved February
   5, 2023, from https://www.kaggle.com/datasets/nguyenthicamlai/population-2022

Mishinev, T. (2022, September 9). World, region, country GDP/GDP per capita. Kaggle.
   Retrieved February 5, 2023, from
   https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021

National Consortium for the Study of Terrorism and Responses to Terrorism. Global
   terrorism database. Kaggle. Retrieved February 5, 2023, from
   https://www.kaggle.com/datasets/START-UMD/gtd

World Bank. GDP (current US$). GDP National Accounts. Retrieved February 5, 2023, from
   https://data.worldbank.org/indicator/NY.GDP.MKTP.CD